Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, a lack of research on explicitly quantifying the data quality of each view when fusing them renders these models inexplicable, performing unsatisfactorily and inflexible in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value which describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
translated by 谷歌翻译
Deep neural networks (DNNs) recently emerged as a promising tool for analyzing and solving complex differential equations arising in science and engineering applications. Alternative to traditional numerical schemes, learning-based solvers utilize the representation power of DNNs to approximate the input-output relations in an automated manner. However, the lack of physics-in-the-loop often makes it difficult to construct a neural network solver that simultaneously achieves high accuracy, low computational burden, and interpretability. In this work, focusing on a class of evolutionary PDEs characterized by having decomposable operators, we show that the classical ``operator splitting'' numerical scheme of solving these equations can be exploited to design neural network architectures. This gives rise to a learning-based PDE solver, which we name Deep Operator-Splitting Network (DOSnet). Such non-black-box network design is constructed from the physical rules and operators governing the underlying dynamics contains learnable parameters, and is thus more flexible than the standard operator splitting scheme. Once trained, it enables the fast solution of the same type of PDEs. To validate the special structure inside DOSnet, we take the linear PDEs as the benchmark and give the mathematical explanation for the weight behavior. Furthermore, to demonstrate the advantages of our new AI-enhanced PDE solver, we train and validate it on several types of operator-decomposable differential equations. We also apply DOSnet to nonlinear Schr\"odinger equations (NLSE) which have important applications in the signal processing for modern optical fiber transmission systems, and experimental results show that our model has better accuracy and lower computational complexity than numerical schemes and the baseline DNNs.
translated by 谷歌翻译
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
Modality representation learning is an important problem for multimodal sentiment analysis (MSA), since the highly distinguishable representations can contribute to improving the analysis effect. Previous works of MSA have usually focused on multimodal fusion strategies, and the deep study of modal representation learning was given less attention. Recently, contrastive learning has been confirmed effective at endowing the learned representation with stronger discriminate ability. Inspired by this, we explore the improvement approaches of modality representation with contrastive learning in this study. To this end, we devise a three-stages framework with multi-view contrastive learning to refine representations for the specific objectives. At the first stage, for the improvement of unimodal representations, we employ the supervised contrastive learning to pull samples within the same class together while the other samples are pushed apart. At the second stage, a self-supervised contrastive learning is designed for the improvement of the distilled unimodal representations after cross-modal interaction. At last, we leverage again the supervised contrastive learning to enhance the fused multimodal representation. After all the contrast trainings, we next achieve the classification task based on frozen representations. We conduct experiments on three open datasets, and results show the advance of our model.
translated by 谷歌翻译
无监督的句子嵌入学习最近由对比度学习方法(例如SIMCSE)主导,该方法保持积极对相似,并将负面对拆开。对比操作旨在通过在积极实例之间最大化相互信息来保持尽可能多的信息,从而导致句子嵌入中的冗余信息。为了解决这个问题,我们提出了一个基于信息最小化的对比度学习(Informin-CL)模型,以保留有用的信息并通过最大化相互信息并最大程度地减少无监督句子表示学习的正面实例之间的信息熵,从而丢弃冗余信息。具体而言,我们发现信息最小化可以通过简单的对比度和重建目标来实现。重建操作通过另一个正实例重构积极实例,以最大程度地减少正实例之间的信息熵。我们在下游任务中评估了我们的模型,包括受监督和无监督的(语义文本相似性)任务。广泛的实验结果表明,我们的Informin-CL获得了最先进的性能。
translated by 谷歌翻译
提示将下游应用程序作为语言建模任务施放,与使用预训练的模型进行标准微调相比,已显示出样本有效的效率。但是,提示的一个陷阱是需要手动设计的模式,其结果可能是不直觉的,需要大量的验证集来调整。为了应对挑战,我们提出了一种全自动提示方法Autoseq:(1)我们在序列到序列模型上采用自然语言提示,从而实现自由形式生成和更大的标签搜索空间; (2)我们提出了标签序列 - 无限长度的短语以口头表达标签 - 这消除了手动模板的需求,并且比单个标签单词更具有表现力; (3)我们使用Beam Search自动生成大量的标签序列候选物,并提出对比度重新排列以获得最佳组合。 Autoseq显着胜过其他无手动设计方法,例如软提示调整,适配器调整和自动搜索单个标签单词;生成的标签序列比各种任务上的精选手动序列更好。我们的方法揭示了几次学习中序列模型的潜力,并阐明了通用通用和自动提示的途径。本文的源代码可以从https://github.com/thunlp/seq2seq-prompt获得。
translated by 谷歌翻译
尽管在情感分析方面取得了巨大的成功,但现有的神经模型在隐式情感分析中挣扎。这可能是由于它们可能会锁定虚假的相关性(例如,“捷径”,例如,仅关注明确的情感词),从而破坏了学习模型的有效性和鲁棒性。在这项工作中,我们提出了一种使用仪器变量(ISAIV)的因果干预模型,用于隐式情感分析。我们首先从因果角度审查情感分析,并分析此任务中存在的混杂因素。然后,我们引入了一个仪器变量,以消除混杂的因果效应,从而在句子和情感之间提取纯粹的因果效应。我们将所提出的ISAIV模型与几个强大的基线进行比较,同时是一般的隐式情感分析和基于方面的隐式情感分析任务。结果表明我们模型的巨大优势以及隐性情感推理的功效。
translated by 谷歌翻译
只有单个目标扬声器的语音供参考的单发语音转换(VC)已成为一个热门研究主题。现有作品通常会散布音色,而有关音高,节奏和内容的信息仍然混合在一起。为了进一步删除这些语音组件,有效地执行一声VC,我们采用随机重新采样用于音高和内容编码器,并使用互信息的各种对比对数比率上限和基于梯度反向层的对抗性相互信息学习来确保不同部分在训练过程中仅包含所需的分离表示的潜在空间。 VCTK数据集的实验显示该模型就自然性和智能性方面实现了一声VC的最新性能。此外,我们可以通过语音表示分离分别传递音色,音调和节奏的单发VC的特征。我们的代码,预训练的模型和演示可在https://im1eon.github.io/is2022-Srdvc/上获得。
translated by 谷歌翻译
语义细分是一种关键技术,涉及高分辨率遥感(HRS)图像的自动解释,并引起了遥感社区的广泛关注。由于其层次表示能力,深度卷积神经网络(DCNN)已成功应用于HRS图像语义分割任务。但是,对大量培训数据的严重依赖性以及对数据分布变化的敏感性严重限制了DCNNS在HRS图像的语义分割中的潜在应用。这项研究提出了一种新型的无监督域适应性语义分割网络(MemoryAdaptnet),用于HRS图像的语义分割。 MemoryAdaptnet构建了一种输出空间对抗学习方案,以弥合源域和目标域之间的域分布差异,并缩小域移位的影响。具体而言,我们嵌入了一个不变的特征内存模块来存储不变的域级上下文信息,因为从对抗学习获得的功能仅代表当前有限输入的变体特征。该模块由类别注意力驱动的不变域级上下文集合模块集成到当前伪不变功能,以进一步增强像素表示。基于熵的伪标签滤波策略用于更新当前目标图像的高额伪不变功能的内存模块。在三个跨域任务下进行的广泛实验表明,我们提出的记忆ADAPTNET非常优于最新方法。
translated by 谷歌翻译
关于车辆路径预测的推理是自动驾驶系统安全运行的必不可少的问题。有许多用于路径预测的研究工作。但是,其中大多数不使用车道信息,也不基于变压器体系结构。通过利用从配备自动驾驶车辆的传感器收集的不同类型的数据,我们提出了一个名为多模式变压器路径预测(MTPP)的路径预测系统,该系统旨在预测目标试剂的长期未来轨迹。为了实现更准确的路径预测,在我们的模型中采用了变压器体系结构。为了更好地利用车道信息,目标试剂不太可能采用与目标试剂相反的车道,因此被过滤掉。另外,将连续的车道块组合在一起,以确保车道输入足够长以进行路径预测。进行了广泛的评估,以显示使用Nuscene(现实世界中的轨迹预测数据集)的拟议系统的功效。
translated by 谷歌翻译